fix: scope get_full_cu_seqlens cache key by device and inference mode #2728
DmCarpe93 wants to merge 8 commits into NVIDIA:main
Conversation
Signed-off-by: Dongmin Ra <dongmin.ra@navercorp.com>
Greptile Summary: This PR fixes a cache-collision bug in get_full_cu_seqlens, whose cache key previously ignored the tensor's device and the inference-mode setting.
Confidence Score: 5/5. Safe to merge: a minimal, targeted fix with no regressions and direct test coverage. No P0/P1 findings. The change is a two-line key extension that directly addresses the described bug. The torch.device and boolean values used as key components are correctly hashable and produce stable equality. Tests cover both scenarios mentioned in the PR description. No files require special attention.

Important Files Changed
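The hashability point in the review summary can be checked with a minimal sketch. The device strings below are stand-ins for torch.device objects (which are likewise hashable with value-based equality); the keys and values are illustrative, not TransformerEngine code.

```python
# Stand-in cache keys shaped like (batch_size, max_seqlen, device, inference_mode).
# Device strings substitute for torch.device here; torch.device hashes by
# value as well, which is what a dict-based cache relies on.
key_a = (4, 128, "cuda:0", False)
key_b = (4, 128, "cuda:0", False)
key_c = (4, 128, "cpu", True)

# Equal keys hash equally, so repeated lookups hit the same entry ...
assert key_a == key_b and hash(key_a) == hash(key_b)

# ... while a differing device or inference-mode component misses the cache.
assert key_a != key_c
cache = {key_a: "cached cu_seqlens"}
assert key_c not in cache
```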
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["get_full_cu_seqlens called"] --> B{"ONNX export mode?"}
    B -- Yes --> C["Skip cache, return directly"]
    B -- No --> D["Read torch.is_inference_mode_enabled"]
    D --> E["Build 4-tuple cache key: batch + seqlen + device + inference_flag"]
    E --> F{"Found in cache?"}
    F -- Yes --> G["Return cached cu_seqlens tensor"]
    F -- No --> H["Create new tensor via torch.arange"]
    H --> I["Store in cache and return"]
```
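The flowchart's control flow can be sketched in plain Python. This is an illustration, not TransformerEngine's implementation: lists stand in for torch tensors, device strings for torch.device, and the ONNX and inference flags are passed explicitly rather than read from global state.

```python
# Module-level cache keyed by the 4-tuple from the flowchart.
_cu_seqlens_cache = {}

def get_full_cu_seqlens_sketch(batch_size, max_seqlen, device,
                               inference_mode, onnx_export=False):
    """Return cumulative sequence lengths [0, s, 2s, ..., b*s] (list stand-in)."""
    def build():
        # In the real code this is a torch.arange on the target device.
        return list(range(0, (batch_size + 1) * max_seqlen, max_seqlen))

    if onnx_export:
        # ONNX export mode: bypass the cache and return directly.
        return build()

    # Scoping the key by device and inference flag prevents a tensor cached
    # on one device (or inside inference mode) from being returned in the
    # other context -- the collision this PR fixes.
    key = (batch_size, max_seqlen, device, inference_mode)
    if key not in _cu_seqlens_cache:
        _cu_seqlens_cache[key] = build()
    return _cu_seqlens_cache[key]
```

With the old two-component key, a call with the same shapes but a different device or inference-mode setting would have hit the stale entry; with the scoped key it builds a fresh one.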
Reviews (7): Last reviewed commit: "Merge branch 'main' into fix/get_full_cu..."
@cyanguwa When you have a moment, could you please take a look at this PR? Thanks:)
@cyanguwa This PR is pretty straightforward. Would you mind taking a quick look? Thank you:)
@cyanguwa Hi:) could you look into this PR? thank you. |
Description
Fixed an issue where the cu_seqlens tensor was incorrectly retrieved from the cache.
Previously, only (batch_size, max_seqlen) was used as the cache key when retrieving cu_seqlens, so a tensor created on one device, or under one inference-mode setting, could be returned in a different context. Now (batch_size, max_seqlen, device, inference_mode) is used.

Type of change
Changes
Checklist: